Smart Paper Notes

A general locomotion control framework for serially connected multi-legged robots

Baxi Chong, Yasemin O. Aydin, Jennifer M. Rieser, Guillaume Sartoretti, Tianyu Wang, Julian Whitman, Abdul Kaba, Enes Aydin, Ciera McFarland, Howie Choset

Category: Robotics

2021-12-01

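Serially connected robots are promising candidates for performing tasks in confined spaces, such as search and rescue in large-scale disasters. Such robots are typically limbless, and we hypothesize that adding limbs could improve mobility. However, a challenge in designing and controlling such devices lies in coordinating high-dimensional, redundant modules in a way that improves mobility. Here we develop a general framework for controlling serially connected multi-legged robots. Specifically, we combine two approaches to build a general shape control scheme that can provide baseline patterns of self-deformation ("gaits") for effective locomotion across diverse robot morphologies. First, we draw inspiration from dimensionality reduction and a biological gait classification scheme to generate cyclic patterns of body deformation and foot lifting/lowering, which facilitate the generation of arbitrary substrate contact patterns. Second, we use geometric mechanics methods to help identify the optimal phasing of these undulations to maximize speed and/or stability. Our scheme enables the development of effective gaits for multi-legged robots locomoting on flat frictional terrain with diverse numbers of limbs (4, 6, 16, and even 0 limbs) and body actuation capabilities (including sidewinding gaits on limbless devices). By properly coordinating body undulation and leg placement, our framework combines the advantages of limbless robots (modularity) and legged robots (mobility). We expect that our framework can provide general control schemes for the rapid deployment of general multi-legged robots, paving the way toward machines that can traverse complex environments under real-world conditions.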

translated by Google Translate
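
As a rough illustration of the two ingredients described above (a body undulation wave and a foot lift/lower pattern phased relative to it), the Python sketch below generates joint angles and stance/swing flags for a hypothetical robot. The sinusoidal template and every parameter value (amplitude, spatial frequency, phase offset, duty factor) are assumptions for illustration only. The paper's actual gait families and optimal phasing come from its dimensionality-reduction and geometric-mechanics analysis, not from this code.

```python
import math
from typing import List

# Hypothetical template: a sinusoidal body wave plus phase-shifted foot contacts.
# Amplitude, spatial frequency, temporal frequency, phase offset and duty factor
# are illustrative assumptions, not values from the paper.


def body_angles(t: float, n_joints: int, amplitude: float = 0.5,
                spatial_freq: float = 1.0, omega: float = 2 * math.pi) -> List[float]:
    """Lateral joint angles (rad) of a traveling wave at time t.

    Joint i lags joint i-1 by 2*pi*spatial_freq/n_joints, so the wave travels
    from the head (i = 0) toward the tail as t increases.
    """
    return [amplitude * math.sin(omega * t - 2 * math.pi * spatial_freq * i / n_joints)
            for i in range(n_joints)]


def foot_contacts(t: float, n_legs: int, spatial_freq: float = 1.0,
                  phase_offset: float = math.pi / 2, omega: float = 2 * math.pi,
                  duty: float = 0.5) -> List[bool]:
    """Stance (True) or swing (False) for each leg along the body at time t.

    Each leg follows its own wave phase, shifted by phase_offset relative to the
    body wave, and is in stance during the first `duty` fraction of its cycle.
    Choosing phase_offset is the kind of phasing question the paper addresses.
    """
    contacts = []
    for i in range(n_legs):
        phase = (omega * t - 2 * math.pi * spatial_freq * i / n_legs + phase_offset) % (2 * math.pi)
        contacts.append(phase < 2 * math.pi * duty)
    return contacts
```

For example, `body_angles(0.25, n_joints=8)` and `foot_contacts(0.25, n_legs=6)` give one snapshot of the hypothetical gait. Sweeping `phase_offset` and scoring each setting in a simulator would be a crude, brute-force stand-in for the paper's principled phasing analysis.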

Causal Deep Learning: Causal Capsules and Tensor Transformers

M. Alex O. Vasilescu

Categories: Machine Learning | Computer Vision

2023-01-01

We derive a set of causal deep neural networks whose architectures are a consequence of tensor (multilinear) factor analysis. Forward causal questions are addressed with a neural network architecture composed of causal capsules and a tensor transformer. The former estimate a set of latent variables that represent the causal factors, and the latter governs their interaction. Causal capsules and tensor transformers may be implemented using shallow autoencoders, but for a scalable architecture we employ block algebra and derive a deep neural network composed of a hierarchy of autoencoders. An interleaved kernel hierarchy preprocesses the data resulting in a hierarchy of kernel tensor factor models. Inverse causal questions are addressed with a neural network that implements multilinear projection and estimates the causes of effects. As an alternative to aggressive bottleneck dimension reduction or regularized regression that may camouflage an inherently underdetermined inverse problem, we prescribe modeling different aspects of the mechanism of data formation with piecewise tensor models whose multilinear projections are well-defined and produce multiple candidate solutions. Our forward and inverse neural network architectures are suitable for asynchronous parallel computation.
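
To make "tensor (multilinear) factor analysis" concrete, here is a minimal Python sketch of a two-factor multilinear model. It assumes a small core tensor `Z` whose entries encode how two causal factors interact to produce an observation `d`. The factors could be, for example, a person vector `p` and a viewpoint vector `v`, combined as d_j = Σ_a Σ_b Z[a][b][j] · p[a] · v[b]. The names and sizes are hypothetical. This is not the paper's capsule/transformer architecture, only the multilinear relationship such models build on.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # hypothetical core tensor, indexed as Z[a][b][j]


def observe(Z: Tensor3, p: List[float], v: List[float]) -> List[float]:
    """Forward model: d_j = sum_a sum_b Z[a][b][j] * p[a] * v[b]."""
    n_out = len(Z[0][0])
    d = [0.0] * n_out
    for a, pa in enumerate(p):
        for b, vb in enumerate(v):
            for j in range(n_out):
                d[j] += Z[a][b][j] * pa * vb
    return d


def mode_matrix(Z: Tensor3, v: List[float]) -> List[List[float]]:
    """Fix factor v and return M with d = M @ p, where M[j][a] = sum_b Z[a][b][j] * v[b].

    Holding all but one factor fixed makes the model linear in the remaining
    factor. This is what makes per-factor (multilinear) projection tractable.
    """
    n_out = len(Z[0][0])
    return [[sum(Z[a][b][j] * v[b] for b in range(len(v))) for a in range(len(Z))]
            for j in range(n_out)]
```

In this toy setting, an inverse question such as recovering `p` from an observed `d` with `v` known reduces to solving the linear system d = M p. When several factors are unknown, the problem can be underdetermined, which is the situation the abstract warns about.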

Measuring and Estimating Key Quality Indicators in Cloud Gaming services

Carlos Baena, O. S. Peñaherrera-Pulla, Raquel Barco, Sergio Fortes

Category: Machine Learning

2022-12-28

User equipment is one of the main bottlenecks facing the gaming industry today. The extremely realistic games currently available impose high computational requirements on the user devices that run them. As a consequence, the game industry has proposed the concept of Cloud Gaming, a paradigm that improves the gaming experience on devices with limited hardware. To this end, games are hosted on remote servers, relegating users' devices to the role of a peripheral for interacting with the game. However, this paradigm overloads the communication links connecting users with the cloud, so the service experience becomes highly dependent on network connectivity. To overcome this, Cloud Gaming will be boosted by the promised performance of 5G and future 6G networks, together with the flexibility provided by mobility in multi-RAT (multiple radio access technology) scenarios, such as WiFi. In this scope, the present work proposes a framework for measuring and estimating the main end-to-end (E2E) metrics of the Cloud Gaming service, namely its key quality indicators (KQIs). In addition, different machine learning techniques are assessed for predicting KQIs related to the Cloud Gaming user's experience. To this end, the main KQIs of the service, such as input lag, freeze percent, and perceived video frame rate, are collected in a real environment. Based on these data, results show that machine learning techniques provide a good estimation of these indicators solely from network-based metrics. This is considered a valuable asset for guiding the delivery of Cloud Gaming services through cellular communication networks even without access to the user's device, as is expected to be the case for telecom operators.
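
The abstract does not name the specific learners, so the sketch below only illustrates the general idea of estimating a KQI from network-side metrics. It fits a single-feature ordinary least-squares model of a hypothetical KQI (input lag, in ms) against one network metric (round-trip time, in ms). The feature choice, units, and numbers are invented toy values, not measurements from the paper.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ≈ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: RTT observed by the network, input lag observed on the client.
rtt_ms = [10.0, 20.0, 30.0, 40.0]
input_lag_ms = [60.0, 80.0, 100.0, 120.0]

slope, intercept = fit_line(rtt_ms, input_lag_ms)
predicted = slope * 25.0 + intercept  # estimate input lag for an unseen RTT of 25 ms
```

Once fitted, a model like this needs only network-side inputs at prediction time. That is the property the abstract highlights for operators without access to the user's device. Real KQI estimators would use many metrics and nonlinear learners.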

The choice of scaling technique matters for classification performance

Lucas B. V. de Amorim, George D. C. Cavalcanti, Rafael M. O. Cruz

Category: Machine Learning

2022-12-23

Dataset scaling, also known as normalization, is an essential preprocessing step in a machine learning pipeline. It aims to adjust attribute scales so that they all vary within the same range. This transformation is known to improve the performance of classification models, but there are several scaling techniques to choose from, and the choice is generally not made carefully. In this paper, we carry out a broad experiment comparing the impact of 5 scaling techniques on the performance of 20 classification algorithms, spanning monolithic and ensemble models, applied to 82 publicly available datasets with varying imbalance ratios. Results show that the choice of scaling technique matters for classification performance: the performance difference between the best and the worst scaling technique is relevant and statistically significant in most cases. They also indicate that choosing an inadequate technique can be more detrimental to classification performance than not scaling the data at all. We also show that the performance variation of an ensemble model across different scaling techniques tends to be dictated by that of its base model. Finally, we discuss the relationship between a model's sensitivity to the choice of scaling technique and its performance, and provide insights into its applicability in different model deployment scenarios. Full results and source code for the experiments in this paper are available in a GitHub repository (https://github.com/amorimlb/scaling_matters).
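
For readers unfamiliar with the options, the following Python sketch implements three widely used column-wise scaling techniques: min-max, z-score standardization, and max-abs scaling. These are common examples only. The abstract does not list which five techniques the paper compares, so this should not be read as the paper's exact set. Each `fit_*` function learns its parameters from training values only and returns a function to apply to any value, mirroring how scaling is fitted on the training split and reused on the test split.

```python
import statistics
from typing import Callable, List


def fit_minmax(train: List[float]) -> Callable[[float], float]:
    """Map the training range [min, max] onto [0, 1] (a constant column maps to 0)."""
    lo, hi = min(train), max(train)
    span = hi - lo
    return lambda x: 0.0 if span == 0 else (x - lo) / span


def fit_zscore(train: List[float]) -> Callable[[float], float]:
    """Subtract the training mean and divide by the training population std."""
    mu = statistics.fmean(train)
    sigma = statistics.pstdev(train)
    return lambda x: 0.0 if sigma == 0 else (x - mu) / sigma


def fit_maxabs(train: List[float]) -> Callable[[float], float]:
    """Divide by the largest absolute training value, so zeros stay zero."""
    m = max(abs(v) for v in train)
    return lambda x: 0.0 if m == 0 else x / m


# Usage: parameters come from the training column only.
train_col = [2.0, 4.0, 6.0, 8.0]
scale = fit_minmax(train_col)
test_scaled = [scale(x) for x in [5.0, 10.0]]  # → [0.5, 1.3333333333333333]
```

Note that a test value above the training maximum maps outside [0, 1]. Different techniques react differently to such shifts and to outliers, which is one intuition for why the choice of scaler can matter.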

DePlot: One-shot visual language reasoning by plot-to-table translation

Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun

Categories: Natural Language Processing | Artificial Intelligence | Computer Vision

2022-12-20

Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities are still quite limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key to this method is a modality conversion module, named DePlot, which translates the image of a plot or chart into a linearized table. The output of DePlot can then be used directly to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off the shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over the finetuned SOTA on human-written queries from the chart QA task.
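
The two-step pipeline can be sketched as follows. A plot-to-table model produces a linearized table, and that table is pasted into a one-shot prompt for an LLM. In this Python sketch, the `|` column separator, the newline row separator, the prompt wording, and the example table are assumptions for illustration. DePlot's actual output format and the paper's prompts may differ, and no model is called here.

```python
from typing import List


def linearize(header: List[str], rows: List[List[str]]) -> str:
    """Flatten a table into text: one line per row, cells joined by ' | ' (assumed format)."""
    lines = [" | ".join(header)]
    lines += [" | ".join(row) for row in rows]
    return "\n".join(lines)


def one_shot_prompt(example_table: str, example_q: str, example_a: str,
                    table: str, question: str) -> str:
    """Build a prompt with one worked example followed by the new query (hypothetical template)."""
    return (
        f"Table:\n{example_table}\nQuestion: {example_q}\nAnswer: {example_a}\n\n"
        f"Table:\n{table}\nQuestion: {question}\nAnswer:"
    )


# Usage with a made-up table standing in for DePlot's output.
table = linearize(["Year", "Sales"], [["2020", "10"], ["2021", "15"]])
prompt = one_shot_prompt(table, "Which year had higher sales?", "2021",
                         table, "By how much did sales grow?")
```

Keeping the chart-reading step separate from the reasoning step is what lets any off-the-shelf LLM be swapped in without retraining.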

Benchmarking person re-identification datasets and approaches for practical real-world implementations

Jose Huaman, Felix O. Sumari, Luigy Machaca, Esteban Clua, Joris Guerin

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-12-20

Recently, Person Re-Identification (Re-ID) has received a lot of attention. Large datasets containing labeled images of various individuals have been released, allowing researchers to develop and test many successful approaches. However, when such Re-ID models are deployed in new cities or environments, the task of searching for people within a network of security cameras is likely to face a significant domain shift, resulting in decreased performance. Indeed, while most public datasets were collected in a limited geographic area, images from a new city present different features (e.g., people's ethnicity and clothing style, weather, and architecture). In addition, whole video frames must be converted into cropped images of people using pedestrian detection models, which behave differently from the human annotators who created the datasets used for training. To better understand the extent of this issue, this paper introduces a complete methodology to evaluate Re-ID approaches and training datasets with respect to their suitability for unsupervised deployment in live operations. This methodology is used to benchmark four Re-ID approaches on three datasets, providing insights and guidelines that can help design better Re-ID pipelines in the future.
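
Re-ID benchmarks are commonly reported with rank-k (CMC) accuracy and mean average precision (mAP). The abstract does not state which metrics the paper's methodology uses. The following Python sketch therefore only shows how these two standard metrics are computed from a query-to-gallery distance matrix, where a smaller distance means more similar. It omits the same-camera filtering that some Re-ID protocols apply.

```python
from typing import List


def rank_k_accuracy(dist: List[List[float]], q_ids: List[int], g_ids: List[int],
                    k: int = 1) -> float:
    """Fraction of queries whose k nearest gallery images contain a true match."""
    hits = 0
    for qi, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda gi: row[gi])
        if any(g_ids[gi] == q_ids[qi] for gi in order[:k]):
            hits += 1
    return hits / len(dist)


def mean_average_precision(dist: List[List[float]], q_ids: List[int],
                           g_ids: List[int]) -> float:
    """Mean over queries of the average precision of each query's gallery ranking."""
    aps = []
    for qi, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda gi: row[gi])
        n_rel = sum(1 for g in g_ids if g == q_ids[qi])
        if n_rel == 0:
            continue  # queries without any gallery match are skipped
        found, precision_sum = 0, 0.0
        for rank, gi in enumerate(order, start=1):
            if g_ids[gi] == q_ids[qi]:
                found += 1
                precision_sum += found / rank  # precision at each correct hit
        aps.append(precision_sum / n_rel)
    return sum(aps) / len(aps) if aps else 0.0
```

Rank-1 rewards only the top match, while mAP also penalizes correct matches ranked low. Under a domain shift, the two metrics can therefore degrade differently.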

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos

Categories: Natural Language Processing | Artificial Intelligence | Computer Vision

2022-12-19

Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning, which are key capabilities in visual language modeling. We perform MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by up to nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures, and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.

Neural Story Planning

Anbang Ye, Christopher Cui, Taiwei Shi, Mark O. Riedl

Categories: Natural Language Processing | Artificial Intelligence

2022-12-16

Automated plot generation is the challenge of generating a sequence of events that readers will perceive as the plot of a coherent story. Traditional symbolic planners plan a story from a goal state and guarantee logical, causal plot coherence, but they rely on a library of hand-crafted actions with their preconditions and effects. This closed-world setting limits the length and diversity of what symbolic planners can generate. On the other hand, pre-trained neural language models can generate stories with great diversity, but they are generally incapable of ending a story in a specified manner and can have trouble maintaining coherence. In this paper, we present an approach to story plot generation that unifies causal planning with neural language models. We propose to use commonsense knowledge extracted from large language models to recursively expand a story plot in a backward-chaining fashion. Specifically, our system infers the preconditions for events in the story and then the events that will cause those conditions to become true. We performed an automatic evaluation to measure narrative coherence, as indicated by the ability to answer questions about whether different events in the story are causally related to other events. Results indicate that our proposed method produces more coherent plotlines than several strong baselines.
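
The backward-chaining loop described above can be sketched in Python. It starts from the ending event, asks what must hold for that event, then asks which event would make each condition true, and recurses. Here `infer_preconditions` and `infer_cause` stand in for the paper's language-model queries. In this sketch they are hypothetical lookups in hand-written dictionaries, and the depth limit is an assumed stopping rule.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins for LLM queries (not the paper's prompts or outputs).
PRECONDITIONS: Dict[str, List[str]] = {
    "hero marries princess": ["princess is free"],
    "hero slays dragon": ["hero has sword"],
}
CAUSES: Dict[str, str] = {
    "princess is free": "hero slays dragon",
    "hero has sword": "hero visits blacksmith",
}


def infer_preconditions(event: str) -> List[str]:
    """Stand-in for asking an LLM what must be true before `event` can happen."""
    return PRECONDITIONS.get(event, [])


def infer_cause(condition: str) -> Optional[str]:
    """Stand-in for asking an LLM which event would make `condition` true."""
    return CAUSES.get(condition)


def plan_backward(goal_event: str, max_depth: int = 5) -> List[str]:
    """Recursively prepend events that establish the goal's preconditions."""
    if max_depth == 0:
        return [goal_event]
    plot: List[str] = []
    for cond in infer_preconditions(goal_event):
        cause = infer_cause(cond)
        if cause is not None:
            plot.extend(plan_backward(cause, max_depth - 1))
    plot.append(goal_event)  # the goal comes after everything that enables it
    return plot
```

Because the plan grows backward from a fixed ending, the story is guaranteed to end as specified. This is the property the abstract says purely neural generation lacks. This toy version does not deduplicate events shared by several preconditions.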

Multi-Agent Patrolling with Battery Constraints through Deep Reinforcement Learning

Chenhao Tong, Aaron Harwood, Maria A. Rodriguez, Richard O. Sinnott

Categories: Artificial Intelligence | Machine Learning | Robotics

2022-12-16

Autonomous vehicles are well suited to continuous area patrolling problems. However, finding an optimal patrolling strategy can be challenging for many reasons. First, patrolling environments are often complex and can include unknown and evolving environmental factors. Second, autonomous vehicles can have failures or hardware constraints, such as limited battery life. Importantly, patrolling large areas often requires multiple agents that need to collectively coordinate their actions. In this work, we consider these limitations and propose a distributed, model-free multi-agent patrolling strategy based on deep reinforcement learning. In this approach, agents make decisions locally based on their own environmental observations and on shared information. In addition, agents are trained to automatically recharge themselves when required to support continuous collective patrolling. We propose a homogeneous multi-agent architecture in which all patrolling agents share an identical policy. This architecture provides a robust patrolling system that can tolerate agent failures and allows supplementary agents to be added to replace failed agents or to increase overall patrol performance. This performance is validated through experiments from multiple perspectives, including overall patrol performance, the efficiency of the battery recharging strategy, the overall robustness of the system, and the agents' ability to adapt to environment dynamics.
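
The paper learns its patrolling policy with deep reinforcement learning, and the Python sketch below does not reproduce that. It only illustrates two ideas from the abstract under simplified, hypothetical assumptions. The first is a single rule shared by every agent (a homogeneous policy) that sends an agent to recharge when its battery falls below a threshold. The second is average node idleness, which is assumed here as the performance measure; it is a common metric in the patrolling literature but is not named in the abstract. The ring-shaped graph, charger location, battery costs, and threshold are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    pos: int                  # node on a hypothetical ring graph; node 0 hosts the charger
    battery: float            # remaining energy units
    recharging: bool = False


def step_agent(a: Agent, n_nodes: int, threshold: float,
               capacity: float, charge_rate: float) -> None:
    """Hand-written rule applied identically by every agent (not a learned policy)."""
    if a.recharging:
        a.battery = min(capacity, a.battery + charge_rate)
        if a.battery >= capacity:
            a.recharging = False  # fully charged: resume patrolling on the next step
        return
    if a.battery <= threshold:
        if a.pos == 0:
            a.recharging = True   # reached the charger: start recharging
            return
        # Low battery: take the shorter way round the ring back to node 0.
        a.pos = a.pos - 1 if a.pos <= n_nodes // 2 else (a.pos + 1) % n_nodes
    else:
        a.pos = (a.pos + 1) % n_nodes  # normal patrol: move clockwise
    a.battery -= 1.0  # every move costs one unit


def simulate(agents: List[Agent], n_nodes: int, steps: int, threshold: float = 6.0,
             capacity: float = 20.0, charge_rate: float = 5.0) -> float:
    """Average node idleness (steps since last visit) over the run.

    threshold should be at least n_nodes // 2 so an agent can always reach the charger.
    """
    idleness = [0] * n_nodes
    total = 0
    for _ in range(steps):
        for a in agents:
            step_agent(a, n_nodes, threshold, capacity, charge_rate)
        idleness = [i + 1 for i in idleness]
        for a in agents:
            idleness[a.pos] = 0  # a node occupied by an agent counts as visited
        total += sum(idleness)
    return total / (steps * n_nodes)
```

For example, `simulate([Agent(0, 20.0), Agent(4, 20.0)], n_nodes=8, steps=100)` runs two agents with identical rules. Because no agent depends on another, agents can be added or removed without changing the rule. This mirrors, in a very simplified way, the robustness argument for the homogeneous architecture.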

How to select an objective function using information theory

Timothy O. Hodson, Thomas M. Over, Tyler J. Smith, Lucy M. Marshall

Category: Machine Learning

2022-12-10

Science tests competing theories or models by evaluating the similarity of their predictions against observational experience. Thus, how we measure similarity fundamentally determines what we learn. In machine learning and scientific modeling, similarity metrics are used as objective functions. A classic example is mean squared error, which is the optimal measure of similarity when errors are normally distributed, independent, and identically distributed (iid). In many cases, however, the error distribution is neither normal nor iid, so it is left to the scientist to determine an appropriate objective. Here, we review how information theory can guide that selection, then demonstrate the approach with a simple hydrologic model.
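
The claim that mean squared error is optimal under iid normal errors follows from maximum likelihood. The short derivation below assumes errors $e_i = y_i - \hat y_i$ that are iid $\mathcal{N}(0, \sigma^2)$, with $\sigma$ fixed:

```latex
\begin{aligned}
p(y_1,\dots,y_n \mid \hat y_1,\dots,\hat y_n)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(y_i-\hat y_i)^2}{2\sigma^2}\right),\\
-\log p
  &= \frac{n}{2}\log\!\left(2\pi\sigma^2\right)
     + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i-\hat y_i\right)^2 .
\end{aligned}
```

Only the second term depends on the predictions, so minimizing the negative log-likelihood is equivalent to minimizing $\frac{1}{n}\sum_i (y_i-\hat y_i)^2$, the mean squared error. By the same argument, iid Laplace errors lead to mean absolute error. A different error distribution therefore implies a different "optimal" objective.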
